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Abstract 

In diploid organisms, selfing reduces the efficiency of selection in removing deleterious mutations from a population. This need not be 
the case for all organisms. Some plants, for example, undergo an extreme form of selfing known as intragametophytic selfing, which 
immediately exposes all recessive deleterious mutations in a parental genome to selective purging. Here, we ask how effectively 
deleterious mutations are removed from such plants. Specifically, we study the extent to which deleterious mutations accumulate in a 
predominantly selfing and a predominantly outcrossing pair of moss species, using genome-wide transcriptome data. We find that 
the selfing species purge significantly more nonsynonymous mutations, as well as a greater proportion of radical amino acid changes 
which alter physicochemical properties of amino acids. Moreover, their purging of deleterious mutation is especially strong in 
conserved regions of protein-coding genes. Our observations show that selfing need not impede but can even accelerate the removal 
of deleterious mutations, and do so on a genome-wide scale. 

Key words: high throughput sequencing, haploid, diploid, haploid-dominant life cycle, intragametophytic selfing, outcrossing, 
deleterious mutations. 



Introduction 

Heritable genetic variation, the substrate of natural selection 
(Maynard Smith and Szathmary 1995), is ultimately caused by 
mutations. Because few mutations are beneficial and most are 
deleterious when they first arise, populations of organisms 
carry a genetic load of deleterious mutations in their genomes. 
Whether natural selection can remove such deleterious muta- 
tions from a population depends on various factors, such as 
how strongly deleterious the mutations are, how large the 
population is, whether an organism is diploid or haploid, 
and whether it reproduces sexually or asexually (Glemin 



2007; Otto and Gerstein 2008; Charlesworth and Willis 
2009; Glemin and Galtier 2012). Among sexually reproducing 
organisms, the mating system — whether organisms repro- 
duce primarily through selfing or outcrossing — is a key 
factor that determines how effectively deleterious mutations 
can be removed. 

On the one hand, selfing reduces a population's effective 
size A/ e , and thus selection's ability to remove weakly delete- 
rious mutations (Charlesworth and Wright 2001 ; Wright et al. 
2008; Wright et al. 2013). In particular, in organisms with a 
diploid-dominant life phase, selfing increases genome-wide 
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homozygosity, which in turn reduces the effective number of 
independently sampled gametes in a population, an effect 
that can cut the effective population size in half (Nordborg 
2000). In addition, high levels of homozygosity cause low ef- 
fective recombination rates (Charlesworth and Wright 2001) 
that further decrease N e and promote the accumulation of 
deleterious mutations (Felsenstein 1 974). Finally, the very pop- 
ulation structure of selfers, which is characterized by small 
subpopulations in which founder events can be frequent, 
can also reduce the effective population size (Charlesworth 
and Wright 2001). 

On the other hand, selfing can facilitate the purging of 
recessive deleterious mutations (Glemin 2007; Glemin and 
Galtier 2012). Such mutations can be completely masked in 
diploids if they are paired with a wild-type allele in a hetero- 
zygote. Selfing increases the incidence of homozygotes where 
two recessive deleterious alleles are paired (Charlesworth and 
Willis 2009). It can thus expose deleterious mutations and 
facilitate their purging. 

Both theoretical and experimental evidence suggest that 
the A/ e -reducing effects of selfing are usually stronger than 
the enhanced exposure of deleterious mutations caused by 
selfing (Bustamante et al. 2002; Slotte et al. 2010, 2013; 
Qiu et al. 2011; Ness et al. 2012; Hazzouri et al. 2013). In 
other words, selfers with a diploid-dominant life cycle should 
overall remove deleterious mutations less well than outcros- 
sers (Charlesworth and Wright 2001; Glemin 2007). 

However, this may not be the case in plants with a haploid- 
dominant life cycle or with a free-living haploid phase for two 
reasons. First, theory suggests that effective population size 
reduction caused by selfing should be less pronounced in hap- 
loid-dominant than in diploid-dominant plants because their 
life cycle poorly fit the assumptions underlying Wright-Fisher 
populations (Sten0ien and Sastad 2001; Bengtsson and 
Cronberg 2009). More specifically, in bryophytes, the auto- 
matic 2-fold reduction of N e caused by selfing does not oper- 
ate. N e is thus primarily reduced by a decrease in effective 
recombination rates (Hedrick 1987a, 1987b). 

In addition to that, plants with an independent haploid 
phase or with a haploid-dominant life cycle can undergo 
"intragametophytic selfing," which can be more effective in 
purging deleterious mutations than regular selfing in diploid- 
dominant plants. Intragametophytic selfing occurs between 
genetically identical male and female gametes produced by 
the same gametophyte (bisexual gametophyte) (fig. 1; Shaw 
2000). It leads to a complete loss of heterozygosity in merely 
one generation (Shaw 2000; Barner et al. 201 1 ; Billiard et al. 
2012). In the resulting diploids, all deleterious mutations are 
exposed to selection. In this way, intragametophytic selfing 
can facilitate the purging of deleterious mutations (Eppley 
et al. 2007). This extreme form of selfing is not possible in 
plants with unisexual gametophytes, where female and male 
gametes are always produced on genetically separate haploid 
individuals. Such plants must undergo outcrossing or 



intergametophytic selfing — a less extreme form of selfing 
among siblings produced by the same diploid (an analogous 
situation to selfing in a hermaphroditic flowering plant, see 
fig. 1), which reduces heterozygosity only by a factor one-half 
per generation (Charlesworth and Wright 2001). 

Taken together, the available body of theory (Hedrick 
1987a, 1987b; Holsinger 1987) suggests that haploid- 
dominant plants with frequent intragametophytic selfing 
might purge their genetic load more effectively than outcros- 
sers. This prediction is supported by some indirect experimen- 
tal evidence, for example, less severe inbreeding depression 
in plants capable of intragametophytic selfing (Klekowski 
1973, 1982; Lloyd 1974; Taylor et al. 2007; Eppley et al. 
2007). 

However, the proportion of the genome subject to purging 
via homozygous exposure is not known. Yet, this information 
is crucial to understand the extent at which intragametophytic 
selfing can affect the accumulation of deleterious mutations 
on a genome-wide scale. If we assume that only diploid-spe- 
cific genes are subject to purging, then the genome-wide 
effect of intragametophytic selfing on the accumulation of 
deleterious mutations should be minor (Szovenyi et al. 
201 1). This is because only very few genes are diploid-specific 
(2-3%; Szovenyi et al. 201 1, 2013) and because genes also 
expressed in the haploid phase may not be affected by purg- 
ing due to effective haploid selection (Klekowski 1973, 1982; 
Lloyd 1974; McCauley et al. 1985; Taylor et al. 2007; Eppley 
et al. 2007; Billiard et al. 2012). Alternatively, if genes ex- 
pressed in both phases can also be affected by purging via 
homozygous exposure, intragametophytic selfing is expected 
to have a genome-wide effect because the majority of the 
genes is shared by the two phases (97%; Szovenyi et al. 
201 1, 2013). Therefore, no experimental evidence specifically 
addresses the question: Do intragametophytic selfers accumu- 
late a lower number of deleterious mutations on a genome- 
wide scale? 

To answer this question, we here use mosses as model 
systems for plants with a haploid-dominant life cycle capable 
of intragametophytic selfing. We compare the genome-wide 
accumulation of deleterious mutations in protein-coding 
genes between two pairs of moss species that either predom- 
inantly self or outcross. We find that selfing mosses accumu- 
late fewer deleterious mutations, and especially so in 
conserved positions of protein-coding genes. Thus, intraga- 
metophytic selfing in haploid-dominant plants can indeed fa- 
cilitate effective purging of deleterious mutations. 

Materials and Methods 

Study Species 

We use transcriptomes of two pairs of moss species to com- 
pare the number of deleterious mutations accumulated be- 
tween species that predominantly self or outcross. The first 



Genome Biol. Evol. 6(5): 1238-1 252. doi:10.1093/gbe/evu099 Advance Access publication May 14, 2014 



1239 



Szovenyi et al. 



GBE 



Selfer 

_J 



(D 



Outcrossing 



® 




Intragametophytic I ntergameto phytic 
selfing selfing 



Outcrosser 
I 



o 



2n 



In 



I ntergameto phytic Outcrossing 
selfing 



Fig. 1 . — Schematic representation of the mating system of selfer and outcrosser organisms with a haploid-dominant life cycle. Solid black arrows refer to 
meiotic events and the dashed horizontal line separates the haploid and diploid phases. Thickness of the gray arrows indicates the relative frequency of 
selfing and outcrossing events in selfers and outcrossers. 



pair comprises Physcomitrella patens and its relative Funaria 
hygrometrica. Both P. patens and F. hygrometrica have hap- 
loid-dominant life cycles, and their haploid individuals with 
combined sexes primarily undergo intragametophytic selfing 
(Shaw 1991; Eppleyetal. 2007; Perroud etal. 2011). Because 
they undergo more selfing, and a more extreme form of self- 
ing than the other species pairs, we refer to them as the 
"selfers. " Both species are genetically and functionally haploid 
and neither of them has an alio- or autopolyploid origin 
(Rensing et al. 2007; Shaw et al. 2010; Liu et al. 2012). 

Our pair of outcrossing organisms (hereinafter referred to 
as "outcrossers") includes two peat moss species Sphagnum 
subsecundum and Sphagnum cribrosum. Their haploid life 
stages can be either females or males, that is, individual 
plants can bear either male or female gametangia, but 
never both. They can outcross or undergo intergametophytic 
selfing, that is, fertilization between products of a single mei- 
osis, comparable to selfing in diploid plants (Shaw 2000). 
Although they may therefore experience some amount of 
selfing (Eppley et al. 2007; Szovenyi et al. 2009), we refer to 
them as outcrossers to distinguish them from the more ex- 
treme selfers P. patens and F. hygrometrica. Importantly, the 
divergence times between the selfer and outcrosser species 
pairs are similar (McDaniel et al. 2010; Shaw et al. 2010; Liu 
et al. 2012; Szovenyi et al. 2013). 

Available evidence suggests that the selfer and outcrosser 
species pairs have comparable range size and population 
structure. Both the selfer and the outcrosser pairs consist of 
a broadly and a more narrowly distributed species. The selfer 
F. hygrometrica is world-wide distributed, whereas P. patens 
occurs primarily in Europe. Similarly, the outcrosser 5. subse- 
cundum is widely distributed across the Northern Hemisphere 



whereas 5. cribrosum only occurs in the United States 
(Anderson et al. 2009). Both the selfer and the outcrosser 
species pairs have well-interconnected populations (via long- 
distance spore dispersal) which is evidenced by weak among- 
population genetic differentiation both at the local and global 
scales (Shaw 1991; Shaw et al. 2008; Szovenyi et al. 2008). 
Within-population genetic diversity is considerably higher in 
the outcrossers indicating greater effective population size 
of local populations (Shaw 1991; Shaw et al. 2008; Szovenyi 
et al. 2008). Finally, both species pairs are effective colonizers 
with small, temporary populations (Shaw 1991; Shaw et al. 
2008; Szovenyi et al. 2008; Anderson et al. 2009). Therefore, 
range size and population structures of the selfer and the 
outcrosser species pairs are comparable, rendering mating 
system differences the primary candidate factor to affect mu- 
tation purging. 

We focus our analysis on among-species divergence data 
because molecular traces of selfing are predicted to be more 
pronounced for among-species divergence data than for 
within-population polymorphism data when mating systems 
have been stable for a long time (Glemin 2007). Phylogenetic 
evidence suggests that the breeding systems of our selfer and 
outcrosser species pairs have indeed been stable, so the 
mating system had enough time to affect the accumulation 
of mutations (Shaw et al. 2010; Liu et al. 2012). 

Sequence Data and Read Filtering 

As a proxy of the incidence of mutation purging, we use the 
accumulation of likely deleterious mutations between selfer 
and outcrosser species pairs. To estimate the incidence of such 
mutations, we used high-throughput sequencing to generate 
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two genome-wide transcriptome data sets for this study 
(5. subsecundum and 5. cribrosum) and employed another 
such data set from our previous work (F. hygromerica) 
(Szovenyi et al. 2013). Briefly, for F. hygrometrica we had col- 
lected samples of three haploid and three diploid developmen- 
tal stages, extracted RNA, and performed sequencing on an 
lllumina GAIIx machine that generated 173,542,944 raw 
reads. We then discarded all reads containing low quality 
base calls (Q<20 and one or more undefined nucleotides 
[N], FASTQ Quality Trimmer, http://hannonlab.cshl.edu/fastx_ 
toolkit/, last accessed May 20, 2014) and/or showing adaptor 
contamination (Tagdust, false discovery rate threshold of 0.01 ; 
Lassmann et al. 2009). We used 133,851,302 quality-filtered 
reads in the final assembly. Detailed description of data collec- 
tion, read filtering and assembly are described in Szovenyi et al. 
(2013). Sequence data are available in the ArrayExpress data- 
base (www.ebi.ac.uk/arrayexpress, last accessed May 16, 
2014) under accession number E-MTAB-1664. 

For 5. subsecundum and 5. cribrosum, we used the 
Spectrum Plant Total RNA kit (Sigma) to extract total RNA 
from 100mg of gametophytic tissue (capitulum) obtained 
from 6-month-old cultivated plants originally collected in 
Swanville, Waldo County, Maine, USA. We note that at the 
time of collection no sporophytic tissues were available and 
thus we extracted RNA from purely gametophytic material. 
Because only few genes are expected to show sporophyte-spe- 
cific expression, differential tissue sampling in the selfer and 
outcrosser species pairs is unlikely to significantly bias our anal- 
yses (Szovenyi et al. 201 1 , 201 3). We subjected Poly-A selected 
RNA to paired-end sequencing (insert size: 300 bp) on one full 
lane and half a lane of an lllumina Hiseq2000 sequencer for 5. 
subsecundum and 5. cribrosum, respectively. The sequencing 
runs provided 163,776,364 and 94,818,996 raw paired end 
reads for 5. subsecundum and 5. cribrosum, respectively. Prior 
to assembly, we filtered out duplicate reads using an in-house 
Python script, trimmed the 3'-end of each read in a pair based 
on quality using the FASTX-Toolkit (FASTQ Quality Trimmer, 
http://hannonlab.cshl.edu/fastx_toolkit/, last accessed May 
16, 2014) with a quality score threshold of 25, clipped off 
lllumina sequencing adapters, when present, with the FASTX- 
Toolkit (FASTQ/A Clipper, http://hannonlab.cshl.edu/fastx_ 
toolkit/, last accessed May 16, 2014), and finally kept only 
paired-end reads for which both sequences were longer than 
40 bp after trimming and clipping. After these steps, 
1 1 7,836,947 and 81 ,91 7,656 cleaned paired end reads were 
left for 5. subsecundum and 5. cribrosum, respectively. Raw 
sequence data obtained are available in the ArrayExpress data- 
base (www.ebi.ac.uk/arrayexpress, last accessed May 16, 
2014) under accession number E-MTAB-2482. 

For the fourth species, P. patens, we retrieved high quality 
genomic data (Rensing et al. 2008) for transcript sequences 
and their corresponding protein translations from a public 
database (http://www.phytozome.net/version8. 0, last 
accessed October 8, 2013). 



Transcriptome Assembly 

We assembled the 5. subsecundum, 5. cribrosum, and F 
hygrometrica RNA-seq data into virtual transcripts using the 
assembler Trinity with default options (Grabherr et al. 201 1), 
which has high sensitivity and leads to the most contiguous 
transcriptome assembly, for example, the highest number of 
full-length transcripts for nonmodel species. In 5. subsecun- 
dum and 5. cribrosum, we assembled a total of 1 1 7,836,947 
and 81,917,656 cleaned paired end reads into 541,929 and 
508,499 contigs, respectively (371,788 and 261,128 "com- 
ponents" according to Grabherr et al. 201 1). In F. hygrome- 
trica, we obtained 133,851,302 cleaned single end reads and 
assembled them into 192,665 transcripts (102,856 "compo- 
nents" according to Grabherr et al. 201 1). 

Protein Translation, Orthology, and Pairwise Alignment of 
Orthologs 

In order to predict coding frame and to generate protein trans- 
lations, we compared all contigs of the assembly against the 
National Center for Biotechnology Information (NCBI) nr 
database plant division (ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr. 
xx.tar.gz , last accessed May 1 0, 201 3) with the aid of BLASTX 
(Altschul et al. 1997). We filtered the Blast (ftp://ftp.ncbi. 
nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2. 
2.25+-ia32-linux.tar.gz, last accessed May 10, 2013; 
Altschul et al. 1997) output and kept only transcripts with 
BLAST hits showing at least 30% identity over at least 1 50 
amino acids. Using these filtering criteria, 21,101 (5. subse- 
cundum), 16,991 (5. cribrosum), and 34,1 19 (F. hygrometrica) 
transcripts had a valid match against the database. After that 
we provided each transcript and its best hit protein as input to 
the program GeneWise2 (http://www.ebi.ac.uk/~birney/ 
wise2/Wise2.4.0, last accessed March 8, 2013; Birney et al. 
2004), and used the best scoring protein-to-cDNA alignment 
to predict the encoding gene's ORF (Open Reading Frame) 
and generate a protein translation. We also discarded all tran- 
scripts with internal stop codons. The various filtering steps 
and the homology-guided translation ensured a high-confi- 
dence transcript set (we discarded transcripts without homol- 
ogy to the database) and helped exclude artifactual or 
contaminant sequences that are expected to occur in de 
novo transcriptome assemblies (Vijay et al. 2013). 

We next identified orthologous gene pairs from the sets of 
proteins thus identified in each species, and did so separately 
for the outcrosser (5. subsecundum vs. 5. cribrosum) and for 
the selfer species pair (F hygrometrica vs. P. patens). In order 
to establish orthology, we first kept the transcript with the 
longest protein translation for each putative gene (referred 
to as a "component" in the software Trinity). We then used 
the reciprocal best hits approach (Bork et al. 1 998) to establish 
orthologous protein pairs, keeping only pairs showing 30% 
identity over at least 150 amino acids in a BLASTP search 
(ftp://ftp.ncbi. nlm.nih.gov/blast/executables/blast+/LATEST/ 
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ncbi-blast-2.2.25+-ia32-linux.tar.gz, last accessed May 10, 
2013; Altschul et al. 1997). 

To generate DNA alignments of the coding sequence of 
orthologous proteins identified, we used Muscle (Edgar 
2004) with default parameters. After that we used PAL2NAL 
(Suyama et al. 2006) to map protein alignments onto nucleo- 
tide alignments. We used the "remove gaps" option of 
PAL2NAL in order to generate codon alignments without gaps. 

Estimating the Accumulation of Deleterious Mutations 

To determine whether deleterious mutations accumulate at 
different rates in selfers compared with outcrossers, we used 
three complementary approaches. 

First, we followed well-established precedence (Wright and 
Andolfatto 2008; Gossmann et al. 2010; Slotte et al. 2011; 
Yang and Gaut 201 1) by assuming that the majority of non- 
synonymous mutations is deleterious, and estimated their in- 
cidence relative to silent mutations. Specifically, we used our 
pairwise alignments to obtain estimates for the number of 
nonsynonymous substitutions per nonsynonymous sites (d/V), 
as well as the number of synonymous substitutions per synon- 
ymous (d5) sites, and computed their ratio d/V/d5 using the 
KaKs_Calculator (Zhang et al. 2006) with the YN (Yang and 
Nielsen) model of sequence evolution (Yang and Nielsen 
2000). We filtered the pairwise alignments by excluding 
entries with d5>2, d5=0, and d/V/d5>2. We used these 
thresholds because alignments with greater d5 values are 
likely to become saturated with synonymous substitutions 
(Axelsson et al. 2008; Mank et al. 2010). Furthermore, visual 
inspection indicated that alignments with d5 > 2 or 6N/6S > 2 
are frequently unreliable and thus cannot be trusted. 
Subsequently, we asked whether the d/V/d5 ratios differed 
between selfer and outcrosser species pairs. Furthermore, 
we also tested whether any such difference persists after ex- 
cluding genes potentially under positive selection. For the latter 
analysis, we kept only gene pairs with 6N/6S < 1 , following the 
method of Subramanian (2013). With this approach we tried 
to distinguish relaxed from positive selection, because both 
can accelerate the accumulation of nonsynonymous muta- 
tions and thus lead to elevated 6N/65 ratios. 

Different amino acids differ in their physicochemical prop- 
erties, such as size, charge, or aromaticity. Conservative amino 
acid changes, which alter an amino acid into one with similar 
physicochemical properties, are usually less detrimental to pro- 
tein function than radical changes, which alter these proper- 
ties (Zhang 2000; Eyre-Walker et al. 2002; Hughes and 
Friedman 2009). The ratio of the number of radical to conser- 
vative amino acid changes that a protein has experienced can 
thus be used as a proxy for the incidence of deleterious mu- 
tations (Zhang 2000; Eyre-Walker et al. 2002; Smith 2003). 
We used a complex classification of amino acids established by 
Miyata et al. (1979), which group amino acids into acidic, 
basic, neutral small, and neutral large categories. We treated 



all amino acid replacements that change an amino acid into 
one of a different category as radical, whereas we classified 
category-preserving changes as conservative. This classifica- 
tion has performed well in multiple studies and provides a 
reliable way to describe how strongly an amino acid change 
affects amino acid properties (Hanada et al. 2007; 
Wernegreen 2011). We estimated the proportion of radical 
(D r ) and conservative (D c ) amino acid changes using the hon- 
new program (Zhang 2000), taking into account the transi- 
tion/transversion bias for each pair of genes, as estimated by 
PAML (pairwise option, model: F3x4, kappa). Before calculat- 
ing these statistics, we again filtered alignments based on d/V/ 
d5 ratios, as explained above (0<d5<2, d/V/d5<2). 
Furthermore, we discarded all alignments with no amino 
acid change (D r = D c = 0) or with no radical change (D r = 
0). We also repeated the analysis by excluding genes that 
may be subject to positive selection, that is, by retaining 
only alignments with d/V/d5< 1 (Subramanian 2013). We de- 
termined significant differences in 6N/6S and D r /D c ratios 
among outcrossers and selfers with a Mann-Whitney U test 
(Sokal and Rohlf 2012). 

Finally, we also used a third approach to estimate the ac- 
cumulation of deleterious mutations. It relies on the observa- 
tion that the lower effective population size (A/ e ) potentially 
caused by selfing will allow slightly deleterious mutations ac- 
cumulate more rapidly, and thus lead to an elevated 6N/6S 
ratio between-species. Theory predicts that the effect of A/ e is 
more pronounced on sites that are under more intense puri- 
fying selection, where the average selection coefficient of 
deleterious mutations is greater (Subramanian 201 3). This pre- 
diction is a consequence of the theoretical relationship 
between co (d/V/d5), N e , and the selection coefficient (s), 
co = 5/(1-e -5 ) where 5 = 4A/ e xs (Kimura 1983; Nielsen 
and Yang 2003; Subramanian 2013). 

Because the rate at which mutations accumulate depends 
on the intensity of selection, the number of accumulated del- 
eterious mutations is expected to be lower in conserved than 
in nonconserved parts of proteins (s conserved > s nonconserved ). 
Furthermore, the relationship among A/ e , s, and co predicts 
that a unit increase in purifying selection (s) will result in a 
steeper decrease in co in the species pair with the larger effec- 
tive population size (N e ). That is, given that A/ e ( se ifer) consider- 
ably differs from A/ e (outcrosser) the difference in the number of 
accumulated deleterious mutations between outcrossers and 
selfers should be more pronounced in conserved parts of pro- 
teins, which are under more intense purifying selection. 
Furthermore, differences in the d/V/d5 and D r /D c ratios be- 
tween conserved and nonconserved regions within genes 
should also differ between outcrossers and selfers. The 
effect of purging via homozygous exposure should be also 
more pronounced in conserved than in nonconserved do- 
mains of proteins, because deleterious mutations with large 
effect can be more effectively purged (Wang et al. 1999). For 
example, if purging is more efficient in selfers, we expect to 
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see not only lower d/V/d5and D r /D c ratios in selfers but also the 
differences to outcrossers should be more pronounced in con- 
served protein regions. Furthermore, d/V/d5 and D r /D c ratios 
between conserved and nonconserved regions within genes 
should differ more markedly in selfers if they experience more 
effective purging. 

To perform this analysis, we searched protein translations 
of assembled transcripts against the conserved domain 
database (Marchler-Bauer and Bryant 2004; Marchler-Bauer 
et al. 201 3) to identify highly conserved regions of the genes in 
our data set. We used the default search parameters and kept 
only the best match for each protein. After that, we merged 
multiple conserved regions per protein into a contiguous 
stretch of amino acids, likewise for nonconserved regions. 
To ensure that our analysis is based on sufficiently many 
amino acid changes, we only used those pairwise alignments 
where each protein contained a conserved and nonconserved 
region of at least 100 amino acids each. 

Furthermore, as in the previous analyses, we discarded 
uninformative or unreliable protein pairs (d5=0, d5> 2, d/V/ 
d5>2). Finally, we used the Ka/Ks estimator (Zhang et al. 
2006) and the hon-new program (Zhang 2000) to obtain 
the ratios 6N/6S and D r /D c , and did so separately for the con- 
served and nonconserved parts of each gene. We determined 
significant differences in d/V/d5 and D r /D c ratios among out- 
crossers and selfers for conserved and nonconserved regions 
separately with a Mann-Whitney U test (Sokal and 
Rohlf 2012). One limitation of this test is that we treat con- 
served and nonconserved regions as independent entities. 
This is because no meaningful pairing exists when analyz- 
ing the total data set. However, dependency of conserved 
and nonconserved regions was fully accounted for 
(Wilcoxon matched pairs tests, see below) when analyzing 
the one-to-one ortholog data set. We also used Wilcoxon 
matched pairs tests (Sokal and Rohlf 2012) to determine 
whether conserved and nonconserved regions (the matched 
pairs) within each gene showed significantly different 6N/6S 
and D r /D c ratios for each species pair separately. We per- 
formed all statistical analyses in R (R Development Core 
Team 2011). 

Comparing functionally different gene sets could obscure 
the effect of mating system on the evolutionary rate of genes. 
To account for this bias, we identified proteins that showed 
one-to-one orthology across all four of our study species be- 
cause orthologous proteins tend to be more similar in function 
than paralogs (Altenhoff et al. 2012). We identified orthologs 
using the algorithm provided in ProteinOrtho (Lechner et al. 
201 1) and repeated all the above analyses on this data set. To 
establish orthology across the four species gene set, we exe- 
cuted ProteinOrtho using all predicted proteins of 5. subse- 
cundum, S. cribrosum, F hygrometrica, and P. patens (http:// 
www.phytozome.net/version 8.0, last accessed October 8, 
2013) with the following parameters: e value threshold of 
10 -4 , 30% identity, adaptive best alignments similarity of 



f=0.95, algebraic connectivity > 0.1. Subsequently, we kept 
only those co-orthologs containing one single protein per spe- 
cies in order to reduce the risk of including paralogous rela- 
tionships. Finally, we used the four-way orthology relationship 
to identify corresponding pairwise orthologous genes (5. sub- 
secundum vs. 5. cribrosum; F hygrometrica vs. P. patens). 

Results 

Selfers Purge Deleterious Mutations Better than 
Outcrossers 

Our analysis of complete quality-filtered transcriptomes (see 
Materials and Methods) identified 10,038 orthologous gene 
pairs in the selfer species (F hygrometrica and P. patens) and 
6,089 orthologous gene pairs in the outcrosser species pair 
(5. subsecundum and 5. cribrosum) (table 1). 

Overall, we found that the mean 6N/6S ratios among 
orthologous genes in both species pairs fell into the interval 
0. 1 -0.4, a range that is typical for such genome-wide analyses 
(Slotte et al. 201 1 ; Yang and Gaut 201 1 ). Importantly, the d/V/ 
65 ratio in the outcrosser species pair was twice as high than in 
the selfer species pair, and this difference was statistically 
significant, according to a Mann-Whitney U test (fig. 2 and 
table 1; z df=1 = 53.48, P< 0.00001). After removing gene 
pairs that may be subject to positive selection (d/V/d5> 1), 
the 6N/6S ratio was still much higher in the outcrosser species 
(table 1; z df=1 = 50.01, P<0.00001). Repeating the above 
analyses on the data set containing only genes that are 
one-to-one orthologous across our four study species led to 
the very same conclusions (supplementary table S1, 
Supplementary Material online). In sum, deleterious mutations 
are purged more efficiently from the genomes of the selfer 
species. 

Selfers Purge Radical Amino Acid Changes More 
Efficiently than Outcrossers 

Many radical changes are deleterious and eliminated by nat- 
ural selection (Zhang 2000). In a complementary analysis 
which focused on the ratio of radical to conservative amino 
acid changes D/D c , we found that the D r /D c ratio was about 
36% greater in the outcrosser than in the selfer species pair 
(fig. 3 and table 1; z df=1 =25.62, P< 0.00001). We repeated 
this analysis by excluding those genes that may be subject to 
positive selection (6N/6S > 1) which led to the same outcome 
(table 1, selfer vs. outcrosser z df=1 =23.59, P<0.00001). 
Finally, repeating these analyses on one-to-one orthologs led 
to the very same conclusions (supplementary table S1, 
Supplementary Material online). Analysis of radical and con- 
served amino acid changes between selfer and outcrosser 
species pairs thus shows that selfers are more efficient in purg- 
ing deleterious mutations than outcrossers, which is consistent 
with our observations based on 6N/6S ratios. 
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Intragametophytic Selfing and the Genetic Load 



GBE 




Selfer 



Outcrosser 



Fig. 2. — Ratio of nonsynonymous mutations per nonsynonymous 
sites to synonymous mutations per synonymous sites (d/V/dS, mean, and 
SE) in the selfer and in the outcrosser species pairs. d/V/d5 ratios are sig- 
nificantly greater in the outcrosser than in the selfer species pair 
(P< 0.00001). 




Selfer 



Outcrosser 



Fig. 3. — Ratio of radical to conserved amino acid changes (DJD Cl 
mean, and SE) in the selfer and in the outcrosser species pairs. D r /D c 
ratios are highly significantly greater in the outcrosser than in the selfer 
species pair (P< 0.00001). 



Comparison of Conserved and Nonconserved Stretches 
of Genes 

In our next analysis, we contrasted 6N/6S or D r /D c ratios in 
conserved and nonconserved regions of genes (see Materials 
and Methods). We first investigated whether the difference in 
6N/6S values between conserved and nonconserved regions 
within each gene differs more considerably in selfers than in 
outcrossers, which would imply more effective purging in self- 
ers. We found that 6N/6S ratios were smaller in conserved 
than in nonconserved regions of genes in both the selfer 



and the outcrosser species pairs. Importantly, in selfing spe- 
cies, this difference was more pronounced (table 2 and fig. 4). 
Specifically, mean 6N/6S ratios were on average 116% 
greater in nonconserved regions of selfers (z s[conserved vs . noncon- 
served] = 40.03, P< 0.00001), and only 73% greater in out- 
crossers (z 0 [conserved vs. nonconserved] = 21 .97, P< 0.00001), 

which is also evident from the z scores of the Wilcoxon 
matched pairs test (z s[conserved vs . nonconserved] = 40.03, 
z 0 [conserved vs. nonconserved] = 22.33; table 2 and fig. 4). We 
repeated this analysis after exclusion of genes that may have 
been subject to positive selection (d/V/d5> 1), which leads to 
the same conclusion (table 2; z s[conserved vs . nonconserved] = 

40. 03, P < 0 . 00001; Z 0 [ CO nserved vs. nonconserved] = 22.17, 

P< 0.00001), as did the same analysis for one-to-one ortho- 
logous genes (supplementary table S2, Supplementary 
Material online). 

Because the ratio of radical to conservative amino acid 
changes (D r /D c ) also reflects mutational purging, we reasoned 
that it should show a pattern similar to d/V/d5 in conserved and 
nonconserved gene regions (table 3 and fig. 5). Indeed, the 
proportion of radical amino acid changes was significantly 
smaller (15%) in conserved regions for selfers (z S [ Conserved vs 
nonconserved] = 18.38, P< 0.00001). This was also true for out- 
crossers but to a significantly lesser extent (1 0%, z 0 [ CO nserved vs. 
nonconserved] = 5.51, P< 0.00001). This conclusion did not 
change when we excluded genes that may have been subject 
to positive selection (d/V/d5> 1; table 3; z s[conserved vs . noncon- 
served] — 1 8.38, P < 0.00001 ; Zo[ C onserved vs. nonconserved] =5.36, 

P< 0.00001) and when we restricted our analysis for one-to- 
one orthologous genes (supplementary table S3, 
Supplementary Material online). Altogether, this means that 
conserved regions accumulate fewer radical amino acid 
changes in the selfers than in the outcrossers. This supports 
the hypothesis that purging of deleterious mutations through 
homozygous exposure is more efficient in selfers than in 
outcrossers. 

The preceding analysis focused on differences in d/V/d5and 
D r /D c ratios among conserved and nonconserved regions 
"within" genes for the selfer and outcrosser species pairs. 
The following analysis focuses on analogous predictions "be- 
tween" species pairs. We found that the d/V/d5 ratio was 
smaller in the selfer than in the outcrosser species pair and 
this difference was more pronounced in conserved gene re- 
gions (table 2 and fig. 4). Specifically, in the conserved do- 
mains of genes, the d/V/d5 ratio was approximately 1 1 6% (P s 
vs 0 < 0.00001) higher in the outcrosser than in the selfer 
species pair. In contrast, in nonconserved regions of genes 
the difference in 6N/6S ratios between selfer species and 
outcrosser species, while significant (P s vs . o < 0.00001), was 
less pronounced (73% greater in outcrosser species than 
in selfing species). Repeating this analysis by excluding 
genes that may have been subject to positive selection (d/V/ 
d5> 1) did not affect this observation (table 2; 6N/6S in con- 
served stretches: P s vs. o< 0.00001; 6N/6S in nonconserved 
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Selfer Outcrosser 



Fig. 4. — Ratio of nonsynonymous mutations per nonsynonymous 
sites to synonymous mutations per synonymous sites (d/V/dS, mean, and 
SE) for conserved and nonconserved domains of genes in the selfer and in 
the outcrosser species pairs. d/V/d5 ratios in conserved and nonconserved 
domains of genes are highly significantly different both within the selfer 
and the outcrosser species pairs studied (P< 0.00001). However, the dif- 
ference between conserved and nonconserved domains is less pro- 
nounced in the outcrosser comparison. Finally, d/V/d5 ratios are 
significantly (P< 0.00001) greater in the outcrosser than in the selfer spe- 
cies pair both in conserved and in nonconserved domains with a greater 
effect in conserved domains. 

stretches: P s vs . 0 < 0.001 ), and neither did the analysis of one- 
to-one orthologous gene sets (supplementary table S2, 
Supplementary Material online). 

Finally, we performed an analogous analysis for the ratio of 
radical to conservative amino acid changes (D r /D c ). Again we 
found that the difference in D r /D c values between selfer and 
outcrosser species pairs was more pronounced in conserved 
than in nonconserved regions of genes (table 3 and fig. 5). 
Specifically, D r /D c ratios in conserved regions of genes were 
approximately 26% (P s vs . 0 < 0.00001 ) greater in the outcros- 
ser species pair and this difference was statistically highly sig- 
nificant. In contrast, in nonconserved regions of genes D r /D c 
values were only approximately 18% (P s vs Q < 0.0001) 
greater between outcrossers and selfers. This observation 
was not affected by excluding genes that may have been sub- 
ject to positive selection (table 3; D r /D c in conserved stretches: 

vs. o< 0.00001; D r /D c in nonconserved stretches: P s vs . 
0 < 0.00001), nor by considering only one-to-one orthologs 
(supplementary table S3, Supplementary Material online). All 
these observations imply that deleterious mutations are more 
effectively purged in the selfers than in the outcrossers. 

Discussion 

Selfing has two opposing effects on the efficacy of purifying 
selection. On the one hand, selfing reduces effective 



population size, because it increases genome-wide homozy- 
gosity (Nordborg 2000) and lowers effective recombination 
rates (Wright et al. 2008, 2013). These effects reduce the 
efficacy of purifying selection. On the other hand, selfing ex- 
poses recessive deleterious mutations in a homozygous state 
and can thus also enhance purging (Glemin 2007). In selfers 
with a diploid-dominant life cycle the first effect dominates, 
thus leading to a net reduction in purifying selection (Slotte 
et al. 2013; Wright et al. 2013). Here we show that in plants 
with a haploid-dominant life cycle undergoing intragameto- 
phytic selfing, an extreme form of selfing, homozygous expo- 
sure of deleterious mutations can lead to enhanced purging 
on a genome-wide scale. 

This assertion is supported by three lines of evidence. First, 
we found that selfers accumulate fewer nonsynonymous mu- 
tations (6N/6S) and radical amino acid replacements (D/D c ) 
than outcrossers. Second, within genes, significantly fewer 
deleterious mutations (d/V/d5 and D r /D c ) accumulate in con- 
served than in nonconserved regions, and this effect is greater 
in selfers than in outcrossers. Finally, by contrasting the accu- 
mulation of deleterious mutations (estimated again through 
6N/6S and D r /D c ) among the two species pairs, we found that 
the difference between selfers and outcrossers is more pro- 
nounced in conserved than in nonconserved gene regions. 

Previous studies provided indirect evidence that intragame- 
tophytic selfing can facilitate purging, but did not directly 
assess the effect of intragametophytic selfing on the accumu- 
lation of deleterious mutations at the level of genes and ge- 
nomes. For instance, they found a lower incidence of lethality 
in homozygous embryos of ferns and a lack of inbreeding 
depression in homozygous sporophytes of mosses capable 
of intragametophytic selfing (Klekowski 1973, 1982; Lloyd 
1974; Hedrick 1987a, 1987b; Taylor et al. 2007). By studying 
constrained sequence evolution on a genome-wide scale, our 
study goes beyond previous phenotypic observations and pro- 
vides direct evidence that purifying selection under intragame- 
tophytic selfing can balance and even outweigh the effect of 
effective population size reduction caused by selfing (Hedrick 
1987a, 1987b; Holsinger 1987). Furthermore, our observa- 
tions also suggest that the correlation between intragameto- 
phytic selfing and a lack of both homozygous lethality and 
inbreeding depression may be explained by a decreased accu- 
mulation of deleterious mutations in the genome. Any direct 
connection between deleterious mutations and observed phe- 
notypic effects, however, has to be experimentally demon- 
strated in future work. 

It is generally assumed that the potential masking of dele- 
terious mutations is restricted to diploid-specific genes of a 
plant's lifecycle. The reason is that deleterious mutations in 
haploid-specific genes (as well as in genes needed in both 
stages) are always exposed to selection. Previous work 
showed that only approximately 2% of genes expressed in 
F. hygrometrica and P. patens show diploid-specific expression 
(Szovenyi etal. 201 1; O'Donoghue etal. 2013). Such a minute 
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Fig. 5. — Ratio of radical to conserved amino acid changes (D r /D c , 
mean, and SE) for conserved and nonconserved domains of genes in 
the selfer and in the outcrosser species pairs. D/D c ratios in conserved 
and nonconserved domains of genes are significantly different both 
within the selfer (P< 0.00001) and in the outcrosser species pairs 
(P< 0.00001). In both conserved and nonconserved domains, D r /D c 
ratios are significantly greater in the outcrosser (P< 0.00001) than in the 
selfer species pair. Nevertheless, this difference is more pronounced in the 
conserved than in the nonconserved domain stretches. 



fraction of the genome subject to effective purging by intra- 
gametophytic selfing might go undetected. However, the sig- 
nificant effect of intragametophytic selfing on purging we 
observed shows otherwise. Together with our previous finding 
that diploid- and haploid-specific genes experience similar ef- 
ficacy of purging (Szovenyi et al. 2013), it contradicts the 
notion that only diploid-specific genes are affected by intra- 
gametophytic selfing. 

To explain this apparent contradiction, we note that a 
gene's exposure to selection in the haploid phase does not 
necessarily lead to more effective selection (Gerstein et al. 
2011; Gerstein and Otto 2011; Billiard et al. 2012; 
Arunkumar et al. 2013; Szovenyi et al. 2013; Zorgo et al. 
2013). Therefore, contrary to previous assumptions, genes 
expressed in both phases can experience the effect of intra- 
gametophytic selfing on purging. One possible mechanism is 
that homozygous exposure of deleterious mutations via intra- 
gametophytic selfing may extend the time span over which 
deleterious mutations are directly exposed to selection (once 
as haploids and once as diploid homozygotes). Deleterious 
mutations, which may escape effective purifying selection in 
the haploid stage, may no longer escape natural selection 
when exposed in both stages. Moreover, the same mutation 
may have different deleterious effects in haploid and diploid 
stages, and their compound effect may be more severe, which 
can further contribute to its efficient purging. Because existing 
theory does not fully account for complications like this, more 



work is needed to fully understand the evolutionary dynamics 
of deleterious mutations in organisms with complex life cycles. 

Because our findings are in stark contrast to the observa- 
tion that selfing leads to reduced selection efficacy (Glemin 
and Galtier 2012), it is important to address the effect of 
confounding factors that may bias our analyses. The first of 
two major factors is that selfer and outcrosser species pairs 
may express different gene sets that could bias our analysis 
(Yang and Gaut 2011). This gene set bias may be partially 
caused by differential tissue sampling in the selfer and out- 
crosser species pairs: RNA was obtained only from garmeto- 
phytic tissues for the outcrosser whereas both gametophytic 
and sporophytic tissues were used for the selfer species pair. 
For this reason we repeated our analyses with genes that were 
one-to-one orthologous across all four study species, which 
yielded the same conclusions (see supplementary material, 
Supplementary Material online). Therefore, gene set bias 
cannot explain the differences we see between the selfer 
and the outcrosser species pairs. 

Second, it is well-known that estimates of synonymous di- 
vergence d5 (a measure of divergence time) and d/V/d5 are 
negatively correlated. That is, d/V/d5 values are higher be- 
tween recently diverged species, because intraspecific poly- 
morphisms contribute prominently to d/V/d5 (Rocha et al. 
2006; Kryazhimskiy and Plotkin 2008; Wolf et al. 2009; 
Mugal et al. 2014). Being aware of this issue, we selected 
species pairs (selfer and outcrosser) that exhibit similar diver- 
gence times. More specifically, both the outcrosser and the 
selfer species pairs are estimated to have diverged about 14- 
20 Ma (Shaw et al. 2010; Szovenyi et al. 2013) which corre- 
sponds to an approximate divergence time of 23-33 A/ e gen- 
erations (assuming an A/ e of 600,000 [McDaniel, van Baren 
etal. 2013] and a mutation rate of 1.8 x 10" 8 [Rensing etal. 
2007]). Previous work suggests that the contribution of poly- 
morphisms to 6N/6S estimates within this time range is small 
and the overall influence of divergence time on total 6N/6S is 
thus weak (Mugal et al. 2014). Because this argument de- 
pends on the parameter estimates used, we also conducted 
a second, complementary analysis. Specifically, we divided 
gene alignments into five categories according to their d5 
values in both species pairs and compared d/V/d5 values for 
each category separately. We found that d/V/d5 ratios were 
significantly greater for the outcrosser species pair in all d5 
categories investigated, both for the full and for the one-to- 
one ortholog data set (supplementary table S4, 
Supplementary Material online). All this suggests that the dif- 
ference in d/V/d5 ratios between selfers and outcrossers is not 
caused by the difference in synonymous divergence. 

Conclusions, Limitations, and Open 
Questions 

Although our study provides evidence that purging under 
intragametophytic selfing can be more effective than under 
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intergametophytic selfing, multiple questions remain to be 
resolved. First, our current analysis is based on the comparison 
of one selfer and one outcrosser species pair. Therefore, with 
the current data set in hand it is difficult to assess the gener- 
ality of our findings. Including multiple selfer and outcrosser 
lineages in future studies will be necessary to test the gener- 
ality of our findings. Second, the extent to which diploid- 
specific genes or genes expressed in both ploidal phases are 
affected by purging has to be determined. This is critical to 
understand the evolutionary mechanism of purging. Third, our 
data suggest that intragametophytic selfing is advantageous, 
because it can efficiently prevent the accumulation of delete- 
rious mutations in organism with extreme selfing. Theory pre- 
dicts, however, that extreme linkage disequilibrium 
established by repeated events of intragametophytic selfing 
may hinder adaptive evolution (Felsenstein 1974; Birky and 
Walsh 1988; Charlesworth 2012). Nevertheless, McDaniel, 
Atwood et al. (2013) showed that hermaphroditic lineages 
of haploid-dominant plants have an elevated diversification 
rate compared with dioecious lineages. Therefore, the better 
purging in haploid selfers may also have long-term macroevo- 
lutionary consequences that need to be explored further. 
Thus, the extent to which intragametophytic selfing constrains 
the adaptive potential of species remains to be determined. 

Supplementary Material 

Supplementary tables S1-S4 are available at Genome Biology 
and Evolution online (http://www.gbe.oxfordjournals.org/). 
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